Clustering Inside Classes Improves Performance of Linear Classifiers

ثبت نشده
چکیده

This work systematically examines a Clustering Inside Classes (CIC) approach to classification. In CIC, each class is partitioned into subclasses based on cluster analysis. We find that CIC, by extracting local structure and producing compact subclasses, can improve performance of linear classifiers. It is compared against a global classifier on four benchmark datasets, using some of the most popular classification methods such as regularized logistic regression and linear Support Vector Machines. We discuss and empirically analyze the effect of the training set size and the number of clusters per class on the results of the CIC approach. We also examine use of an automated method for selecting the number of clusters for each class.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

متن کامل

On Using a Pre-clustering Technique to Optimize LDA-Based Classifiers for Appearance-Based Face Recognition

Fisher’s Linear Discriminant Analysis (LDA) is a traditional dimensionality reduction method that has been proven to be successful for decades. To enhance the LDA’s power for high-dimensional pattern classification, such as face recognition, numerous LDA-extension approaches have been proposed in the literature. This paper proposes a new method that improves the performance of LDA-based classif...

متن کامل

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use boosting ensemble of weak classifiers to implement misuse intrusion detection system. It can identify new classes types of intrusions that do not exist in the training dataset for incremental misuse detection. As...

متن کامل

Context-sensitive intra-class clustering

This paper describes a new semi-supervised learning algorithm for intra-class clustering (ICC). ICC partitions each class into sub-classes in order to minimize overlap across clusters from different classes. This is achieved by allowing partitioning of a certain class to be assisted by data points from other classes in a context-dependent fashion. The result is that overlap across sub-classes (...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008